QSAR Modeling of Antimycobacterial Activities of N-Benzylsalicylamides and N-Benzylsalicylthioamides Derivatives against Mycobacterium kansasii CNCTC My (235/80) Using Topological Parameter
Supratim Ray*
Division of Pharmaceutical Chemistry, Dr. B C Roy College of Pharmacy and Allied Health Sciences, Bidhannagar, Durgapur, 713 206, India
Corresponding author: supratimray_in@yahoo.co.in
ABSTRACT:
The aim of the present work is to explore the utility of a QSAR study of the in vitro antimycobacterial activities of N-benzylsalicylamide and N-benzylsalicylthioamide derivatives, reported by Dolezal et al., against Mycobacterium kansasii CNCTC My (235/80) using the electrotopological state atom (E-state) parameter. The reported minimum inhibitory concentrations (MIC) of the compounds were determined after 14 days of incubation. The statistical tools used in this communication are stepwise regression analysis and partial least squares (PLS) analysis. All the developed models indicate the importance of the connecting methylcarboxamido / methylthiocarboxamido moiety between the two substituted phenyl groups. Based on internal validation (Q2) and external validation (R2pred), the PLS model was found to be the best (Q2=0.595, R2pred=0.759).
KEYWORDS: QSAR, E-state, stepwise regression, PLS, N-Benzylsalicylamides, Benzylsalicylthioamides, Mycobacterium kansasii
INTRODUCTION:
Mycobacterium kansasii probably produces infection via an aerosol route. Although it is not known with certainty, tap water is likely a major reservoir for the Mycobacterium kansasii that causes human infection. The most common disease produced by M. kansasii is a chronic pulmonary infection that resembles pulmonary tuberculosis. However, it may also infect other organs. M. kansasii infection is the second-most-common nontuberculous opportunistic mycobacterial infection associated with AIDS1. In addition, the emergence of antibiotic-resistant pathogens is a serious health problem worldwide today. Owing to the emergence of multidrug resistance, there is an urgent need to develop new drug candidates and to gain deeper knowledge of the mechanisms of action of existing and future active compounds. A QSAR study has indicated the utility of the thiobenzanilide moiety against Mycobacterium kansasii2. In this context, a QSAR study was performed on the antimycobacterial activities of N-benzylsalicylamide and N-benzylsalicylthioamide derivatives against Mycobacterium kansasii CNCTC My (235/80).
Computational:
Electrotopological state atom (E-state) index:
Structural specificity of a drug molecule is exhibited at the atomic or fragmental level rather than at the level of the whole molecule. In the drug-receptor interaction, one portion of the molecule (the pharmacophore) may play a more important role than the other segments. Although the basic information for constructing topological indices is derived from the atom level (count of atoms, bonds, paths of bonds, etc.), most of the indices are applied to the whole molecule after summing all components over the molecule. Thus QSAR studies at the atomic or fragmental level are justified in the present context3.
The electrotopological state atom (E-state) index developed by Hall and Kier4 is an atom level descriptor encoding both the electronic character and topological environment of each skeletal atom in a molecule. The E-state of a skeletal atom is formulated as an intrinsic value $I_i$ plus a perturbation term $\Delta I_i$, arising from the electronic interaction within the molecular topological environment of each atom in the molecule.
The intrinsic value has been defined as the ratio of a measure of electronic state (Kier-Hall valence state electronegativity) to the local connectedness. The count of valence electrons which are the most reactive and involved in chemical reactions and bond formations are considered in the expression of I to encode the electronic feature. To reflect differences in electronegativity among the atoms, principal quantum number is employed in the expression of I. The topological attribute is included by using adjacency count of atom. The intrinsic value of an atom i is defined as
$$I_i = \frac{(2/N)^2\,\delta_i^{v} + 1}{\delta_i} \qquad (1)$$

In Eq. (1), N stands for the principal quantum number, and $\delta_i^{v}$ and $\delta_i$ indicate the count of valence electrons and sigma electrons, respectively, associated with atom i in the hydrogen-suppressed graph. The intrinsic electrotopological state calculated according to Eq. (1) produces different values for an atom at different degrees of substitution (branching). The values also differ for atoms that differ in electronegativity. The intrinsic values increase with increasing electronegativity or electron-richness and decrease with increasing branching (substitution).
The perturbation factor for the intrinsic state of atom i is defined as
$$\Delta I_i = \sum_{j \neq i} \frac{I_i - I_j}{r_{ij}^{2}} \qquad (2)$$

In Eq. (2), $r_{ij}$ stands for the graph separation factor, i.e., the count of skeletal atoms in the shortest path connecting atoms i and j, including both atoms.
Summation of intrinsic state of an atom and influence of the field is called electrotopological state of the atom.
$$S_i = I_i + \Delta I_i \qquad (3)$$
It is a representation of molecular structure information as it varies with changes in structural features including branching, cyclicity, homologation, heteroatom variation, and changes in relative positions of different groups. The electrotopological state considers both bonded and non-bonded interactions: the bonded component depends simply on differences in electronegativity among the adjacent atoms. The non-bonded interactions may be through inductive effect across the skeleton and is a function of graph separation factor and electronegativity differences. Thus, electrotopological state represents electronic distribution information modified by both local and global topology. The information encoded in the E-state value for an atom is the electronic accessibility at that atom.
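The calculation described above can be illustrated with a minimal Python sketch. It is not the SRETSA program used in this work; it assumes the usual Kier-Hall form of the intrinsic state, I = ((2/N)^2 δv + 1)/δ, with δv and δ taken as the valence and sigma electron counts minus attached hydrogens, and a perturbation that falls off with the square of the graph separation. The function names and the ethanol test molecule are illustrative choices, not part of the original study.

```python
from collections import deque

def intrinsic_state(n, delta_v, delta):
    """Intrinsic state I = ((2/N)^2 * delta_v + 1) / delta (assumed Kier-Hall form)."""
    return ((2.0 / n) ** 2 * delta_v + 1.0) / delta

def graph_separation(adjacency, start):
    """Breadth-first search; r counts the atoms on the shortest path, both ends included."""
    r = {start: 1}
    queue = deque([start])
    while queue:
        a = queue.popleft()
        for b in adjacency[a]:
            if b not in r:
                r[b] = r[a] + 1
                queue.append(b)
    return r

def e_states(atoms, adjacency):
    """Return {atom: S}, where S = I + sum over j != i of (I_i - I_j) / r_ij^2."""
    intrinsic = {i: intrinsic_state(*atoms[i]) for i in atoms}
    states = {}
    for i in atoms:
        r = graph_separation(adjacency, i)
        perturbation = sum((intrinsic[i] - intrinsic[j]) / r[j] ** 2
                           for j in atoms if j != i)
        states[i] = intrinsic[i] + perturbation
    return states

# Hypothetical example: hydrogen-suppressed graph of ethanol, CH3-CH2-OH.
# Each atom is (principal quantum number N, delta_v, delta).
ATOMS = {"C1": (2, 1, 1), "C2": (2, 2, 2), "O3": (2, 5, 1)}
BONDS = {"C1": ["C2"], "C2": ["C1", "O3"], "O3": ["C2"]}

if __name__ == "__main__":
    for atom, value in e_states(ATOMS, BONDS).items():
        print(atom, round(value, 4))
```

Because every perturbation term appears once with each sign, the E-state values of a molecule sum to the sum of its intrinsic states, which is a convenient sanity check on any implementation.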
The present communication shows the utility of E-state parameters in QSAR studies by exploring the QSAR of the antimycobacterial activities of the N-benzylsalicylamide and N-benzylsalicylthioamide data set reported by Dolezal et al5, using stepwise regression and PLS.
Materials and Methods:
The Data-set and descriptors:
The in vitro antimycobacterial activities of N-benzylsalicylamide and N-benzylsalicylthioamide derivatives against Mycobacterium kansasii CNCTC My (235/80) reported by Dolezal et al5 were used as the model data set for the present QSAR analysis (Table 1). The reported minimum inhibitory concentrations (MIC) of the compounds, determined after 14 days of incubation, were in the μM range; they were converted to the mM range and then to a logarithmic scale [log (10^3 / MIC), with MIC in μM]. The QSAR analysis was performed using the electrotopological state atom (E-state) parameter. The whole data set contains forty-four compounds, and all the compounds contain 17 common atoms (excluding hydrogen). The atoms of the molecules were numbered keeping the serial numbers of the common atoms the same in all the compounds (as shown in Fig. 1). The electrotopological states of the 17 common atoms for all of the compounds were calculated using a Visual Basic program, SRETSA, developed partly by the author6. The program uses, as input, only the connection table in a specific format along with the intrinsic state values of the different atoms. The biological activity data were then added to the output file to make it ready for subsequent regression analysis.
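The activity transformation can be sketched in a few lines of Python. The function name is hypothetical; the formula is the log (1000/C14d) expression given in the header of Table 1, with the MIC expressed in μmol/L.

```python
import math

def pc14d(mic_umol_per_l):
    """Return log10(1000 / MIC) for an MIC expressed in umol/L."""
    if mic_umol_per_l <= 0:
        raise ValueError("MIC must be positive")
    return math.log10(1000.0 / mic_umol_per_l)

if __name__ == "__main__":
    # MIC values taken from Table 1 (compounds 1, 2 and 36).
    for mic in (62.5, 16, 0.25):
        print(mic, round(pc14d(mic), 5))
```

A lower MIC therefore gives a higher pC14d, so larger values of the response variable correspond to more active compounds.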
Model development:
To begin the model development process, the whole data set (n=44) was divided into a training set (n=33, 75% of the total number of compounds) and a test set (n=11, 25% of the total number of compounds) by the k-means clustering technique7 applied to the standardized descriptor matrix of the E-state parameters. QSAR models were developed using the training set compounds (optimized by Q2), and the developed models were then validated externally using the test set compounds. Stepwise regression and PLS were performed using the statistical software MINITAB8.
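One way to draw a test set from pre-computed clusters is sketched below in Python. The cluster memberships are those listed in Table 2; the rule of taking about 25% of each cluster at evenly spaced positions is an assumption made for illustration, since the text does not state how members were picked within a cluster, and `cluster_split` is a hypothetical helper name.

```python
# Cluster memberships (compound serial numbers) as listed in Table 2.
CLUSTERS = {
    1: [16, 17, 18, 19, 22, 23, 24, 27, 28, 29,
        34, 35, 36, 37, 38, 40, 41, 42, 43],
    2: [2, 11, 15, 25, 44],
    3: [1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14],
    4: [20, 21, 26, 30, 31, 32, 33, 39],
}

def cluster_split(clusters, test_fraction=0.25):
    """Pick roughly test_fraction of every cluster, evenly spaced, as test compounds."""
    train, test = [], []
    for members in clusters.values():
        members = sorted(members)
        n_test = round(test_fraction * len(members))
        step = len(members) / n_test if n_test else 0.0
        # Take the member nearest the middle of each of n_test equal slices.
        picked = {members[int(step * k + step / 2)] for k in range(n_test)}
        test.extend(m for m in members if m in picked)
        train.extend(m for m in members if m not in picked)
    return sorted(train), sorted(test)

if __name__ == "__main__":
    train_ids, test_ids = cluster_split(CLUSTERS)
    print(len(train_ids), len(test_ids))
```

With the Table 2 clusters this rule yields 33 training and 11 test compounds, the same sizes as in the text, although the specific compounds selected need not match the test set actually used.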
Figure 1: Common atom of the molecules
Table 1: Molecular scaffolds of the compounds along with their activity

| Compound No. | Type of compound | R1 | R2 | MIC value (μmol/L) against Mycobacterium kansasii for 14 days (C14d) | pC14d = log (1000/C14d) |
|---|---|---|---|---|---|
| 1 | I | H | 4-tert-but | 62.5 | 1.20412 |
| 2 | I | H | 3-CF3 | 16 | 1.79588 |
| 3 | I | 5-Br | 3-Br | 32 | 1.49485 |
| 4 | I | 5-Br | 4-Br | 32 | 1.49485 |
| 5 | I | 3,5 Cl2 | 4-tert-but | 62.5 | 1.20412 |
| 6 | I | 4-Cl | 4-Br | 32 | 1.49485 |
| 7 | I | 4-CH3 | H | 125 | 0.90309 |
| 8 | I | 4-CH3 | 4-CH3 | 250 | 0.60206 |
| 9 | I | 4-CH3 | 4-Cl | 125 | 0.90309 |
| 10 | I | 4-CH3 | 4-tert-but | 32 | 1.49485 |
| 11 | I | 4-CH3 | 3-NO2 | 62.5 | 1.20412 |
| 12 | I | 4-OCH3 | 3-Cl | 62.5 | 1.20412 |
| 13 | I | 3-CH3 | H | 62.5 | 1.20412 |
| 14 | I | 3-CH3 | 4-Cl | 62.5 | 1.20412 |
| 15 | I | 3,5 Br2 | 4-CF3 | 62.5 | 1.20412 |
| 16 | II | H | H | 1 | 3 |
| 17 | II | H | 4-CH3 | 0.5 | 3.30103 |
| 18 | II | H | 4-Cl | 1 | 3 |
| 19 | II | H | 4-OCH3 | 4 | 2.39794 |
| 20 | II | H | 3,4 Cl2 | 2 | 2.69897 |
| 21 | II | H | 4-F | 4 | 2.39794 |
| 22 | II | H | 3-CH3 | 2 | 2.69897 |
| 23 | II | H | 4-tert-but | 2 | 2.69897 |
| 24 | II | H | 3-Cl | 1 | 3 |
| 25 | II | H | 3-CF3 | 2 | 2.69897 |
| 26 | II | 5-Br | 3,4 Cl2 | 16 | 1.79588 |
| 27 | II | 5-Br | 3-Br | 8 | 2.09691 |
| 28 | II | 5-Br | 4-Br | 8 | 2.09691 |
| 29 | II | 5-Cl | H | 8 | 2.09691 |
| 30 | II | 5-Cl | 3,4 Cl2 | 32 | 1.49485 |
| 31 | II | 5-Cl | 4-F | 8 | 2.09691 |
| 32 | II | 3,5 Cl2 | 3,4 Cl2 | 32 | 1.49485 |
| 33 | II | 3,5 Cl2 | 4-tert-but | 62.5 | 1.20412 |
| 34 | II | 4-Cl | 4-Br | 8 | 2.09691 |
| 35 | II | 4-CH3 | H | 2 | 2.69897 |
| 36 | II | 4-CH3 | 4-CH3 | 0.25 | 3.60206 |
| 37 | II | 4-CH3 | 4-Cl | 0.5 | 3.30103 |
| 38 | II | 4-CH3 | 4-tert-but | 1 | 3 |
| 39 | II | 4-CH3 | 3-NO2 | 2 | 2.69897 |
| 40 | II | 5-OCH3 | H | 16 | 1.79588 |
| 41 | II | 4-OCH3 | H | 8 | 2.09691 |
| 42 | II | 4-OCH3 | 3-Cl | 2 | 2.69897 |
| 43 | II | 3-CH3 | 4-Cl | 2 | 2.69897 |
| 44 | II | 3,5 Br2 | 4-CF3 | 62.5 | 1.20412 |
Table 2: k-Means clustering of compounds using standardized descriptors

| Cluster No. | No. of compounds in each cluster | Compounds (Sl. nos.) in each cluster |
|---|---|---|
| 1 | 19 | 16, 17, 18, 19, 22, 23, 24, 27, 28, 29, 34, 35, 36, 37, 38, 40, 41, 42, 43 |
| 2 | 5 | 2, 11, 15, 25, 44 |
| 3 | 12 | 1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14 |
| 4 | 8 | 20, 21, 26, 30, 31, 32, 33, 39 |
Stepwise Regression:
In stepwise regression9, a multiple-term linear equation is built step by step. The basic procedure involves (1) identifying an initial model; (2) iteratively “stepping”, i.e., repeatedly altering the model of the previous step by adding or removing a predictor variable in accordance with the “stepping criteria” (in our case, F = 4 for inclusion and F = 3.9 for exclusion); and (3) terminating the search when stepping is no longer possible given the stepping criteria, or when a specified maximum number of steps has been reached. Specifically, at each step all variables are reviewed and evaluated to determine which one will contribute most to the equation. That variable is then included in the model, and the process starts again. A limitation of the stepwise regression search approach is that it presumes that there is a single “best” subset of X variables and seeks to identify it. There is often no unique “best” subset, and all possible regression models with a similar number of X variables as in the stepwise regression solution should be fitted subsequently to study whether some other subset of X variables might be better.
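The stepping criteria can be made concrete with a short Python sketch of the partial F test that stepwise procedures commonly apply to a candidate variable. The thresholds are the values quoted above; the partial F formula, the function names and the example numbers are assumptions for illustration and are not taken from the MINITAB output of this study.

```python
F_TO_ENTER = 4.0   # inclusion threshold quoted in the text
F_TO_REMOVE = 3.9  # exclusion threshold quoted in the text

def partial_f(rss_without, rss_with, n_obs, n_predictors_with):
    """Partial F for one variable: drop in residual sum of squares over residual mean square.

    rss_without: residual sum of squares of the model lacking the variable
    rss_with: residual sum of squares of the model containing it
    n_predictors_with: number of predictors in the larger model (intercept excluded)
    """
    df_resid = n_obs - n_predictors_with - 1
    if df_resid <= 0 or rss_with <= 0:
        raise ValueError("need positive residual degrees of freedom and RSS")
    return (rss_without - rss_with) / (rss_with / df_resid)

def should_enter(f_value):
    """A candidate variable is added when its partial F reaches the inclusion threshold."""
    return f_value >= F_TO_ENTER

def should_remove(f_value):
    """A variable already in the model is dropped when its partial F falls below the exclusion threshold."""
    return f_value < F_TO_REMOVE

if __name__ == "__main__":
    # Hypothetical numbers: 33 training compounds, second variable under test.
    f = partial_f(rss_without=20.0, rss_with=15.0, n_obs=33, n_predictors_with=2)
    print(f, should_enter(f))
```

Keeping the exclusion threshold slightly below the inclusion threshold prevents a variable from being added and removed in alternate steps.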
PLS:
PLS is a generalization of regression that can handle data with strongly correlated and/or noisy or numerous X variables10-11. It gives a reduced solution, which is statistically more robust than multiple linear regression (MLR). The linear PLS model finds “new variables” (latent variables or X scores) that are linear combinations of the original variables. To avoid overfitting, each consecutive PLS component is subjected to a strict significance test, and the extraction of components stops when they become nonsignificant. Application of PLS thus allows the construction of larger QSAR equations while still avoiding overfitting and eliminating most variables. PLS is normally used in combination with cross-validation to obtain the optimum number of components. This ensures that the QSAR equations are selected based on their ability to predict the data rather than to fit the data. In the PLS analysis of the present data set, variables with smaller standardized regression coefficients were removed from the PLS regression until there was no further improvement in the Q2 value, irrespective of the number of components.
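How latent variables are extracted can be illustrated with a minimal single-response PLS (PLS1) sketch in plain Python, using the NIPALS-style deflation scheme. This is a sketch of the general technique under stated assumptions, not the MINITAB implementation used in this work; the function names and the toy data are hypothetical, and the X variables are only mean-centred here (no scaling).

```python
def pls1_fit(X, y, n_components):
    """Fit PLS1 by successive deflation; returns (x_mean, y_mean, [(w, p, q), ...])."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[k] for row in X) / n for k in range(m)]
    y_mean = sum(y) / n
    E = [[row[k] - x_mean[k] for k in range(m)] for row in X]  # X residuals
    f = [v - y_mean for v in y]                                # y residuals
    components = []
    for _ in range(n_components):
        # Weight vector: direction in X space most covariant with the y residuals.
        w = [sum(E[i][k] * f[i] for i in range(n)) for k in range(m)]
        norm = sum(v * v for v in w) ** 0.5
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(E[i][k] * w[k] for k in range(m)) for i in range(n)]  # X scores
        tt = sum(v * v for v in t)
        p = [sum(E[i][k] * t[i] for i in range(n)) / tt for k in range(m)]  # X loadings
        q = sum(f[i] * t[i] for i in range(n)) / tt                         # y loading
        # Deflate: remove what this component explains from X and y.
        E = [[E[i][k] - t[i] * p[k] for k in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components

def pls1_predict(model, x):
    """Predict y for one sample by replaying the deflation steps on the new x."""
    x_mean, y_mean, components = model
    e = [x[k] - x_mean[k] for k in range(len(x))]
    y_hat = y_mean
    for w, p, q in components:
        t = sum(e[k] * w[k] for k in range(len(e)))
        y_hat += q * t
        e = [e[k] - t * p[k] for k in range(len(e))]
    return y_hat

if __name__ == "__main__":
    # Toy data generated from y = 2*x1 + 3*x2 + 1 (hypothetical, noise-free).
    X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
    y = [9, 8, 22, 18, 35]
    model = pls1_fit(X, y, n_components=2)
    print(round(pls1_predict(model, [6, 2]), 6))
```

When as many components are kept as X has independent columns, PLS1 reproduces the ordinary least-squares fit; the benefit of PLS comes from stopping earlier, at the number of components that cross-validation supports.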
Statistical qualities:
The statistical qualities of the equations were judged by the parameters such as determination coefficient (R2) and variance ratio (F) at specified degrees of freedom (df) 12. The generated QSAR equations were validated by leave-one-out cross-validation R2 (Q2) and predicted residual sum of squares (PRESS)13-14 and then were used for the prediction of antimycobacterial activity of the test set compounds. The prediction qualities of the models were judged by statistical parameters like predictive R2 (R2pred).
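The validation statistics named above can be written out as a short Python sketch. The text does not give their formulas, so the definitions below are the ones commonly used in QSAR work and should be read as assumptions: Q2 and R2pred both compare a sum of squared prediction errors with the spread of the observed values about the training set mean, and the adjusted R2 penalizes R2 for the number of predictors. The function names are hypothetical.

```python
def q2_loo(y_train, y_loo_pred):
    """Q2 = 1 - PRESS / sum((y - mean(y_train))^2), using leave-one-out predictions."""
    mean = sum(y_train) / len(y_train)
    press = sum((o - p) ** 2 for o, p in zip(y_train, y_loo_pred))
    ss_total = sum((o - mean) ** 2 for o in y_train)
    return 1.0 - press / ss_total

def r2_pred(y_test, y_test_pred, y_train_mean):
    """R2pred = 1 - sum((y_test - pred)^2) / sum((y_test - training mean)^2)."""
    ss_error = sum((o - p) ** 2 for o, p in zip(y_test, y_test_pred))
    ss_total = sum((o - y_train_mean) ** 2 for o in y_test)
    return 1.0 - ss_error / ss_total

def adjusted_r2(r2, n_obs, n_predictors):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - p - 1)."""
    return 1.0 - (1.0 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)

if __name__ == "__main__":
    # R2 of the stepwise model from Table 4 with n = 33; two predictors is an
    # assumption based on the two E-state terms (S1, S10) discussed for that model.
    print(round(adjusted_r2(0.647, 33, 2), 3))
```

Under the two-predictor assumption the adjusted value agrees with the Ra2 reported for the stepwise model in Table 4 to three decimals.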
RESULTS AND DISCUSSION:
Membership of compounds in the different clusters generated using the k-means clustering technique is shown in Table 2. The test set size was set to approximately 25% of the total data set size15, and the test set members, along with their observed and calculated activities, are given in Table 3. The statistical qualities of all important models are listed in Table 4. The results obtained from the different statistical methods are described below, together with the interpretation of the equations.
Table 3: Observed and calculated antimycobacterial activities from different models

| Sl. No. | Obs a (pC14d) | Cal b | Cal c |
|---|---|---|---|
| Training set | | | |
| 1 | 1.20412 | 1.34966 | 1.641299 |
| 2 | 1.79588 | 1.164929 | 1.532992 |
| 4 | 1.49485 | 1.266241 | 1.149605 |
| 5 | 1.20412 | 0.184651 | 0.439767 |
| 6 | 1.49485 | 1.088047 | 1.450254 |
| 8 | 0.60206 | 1.457057 | 1.557383 |
| 9 | 0.90309 | 1.437628 | 1.543609 |
| 10 | 1.49485 | 1.333191 | 1.452521 |
| 13 | 1.20412 | 1.537412 | 1.449837 |
| 14 | 1.20412 | 1.464284 | 1.395341 |
| 15 | 1.20412 | 0.764165 | 0.573392 |
| 16 | 3 | 2.56732 | 2.810262 |
| 18 | 3 | 2.479918 | 2.75566 |
| 19 | 2.39794 | 2.44195 | 2.724765 |
| 20 | 2.69897 | 2.39337 | 2.696149 |
| 22 | 2.69897 | 2.478334 | 2.761125 |
| 24 | 3 | 2.466499 | 2.748544 |
| 25 | 2.69897 | 2.150228 | 2.526398 |
| 26 | 1.79588 | 2.19299 | 2.107778 |
| 27 | 2.09691 | 2.288821 | 2.17596 |
| 28 | 2.09691 | 2.29206 | 2.177609 |
| 30 | 1.49485 | 1.810865 | 1.900118 |
| 31 | 2.09691 | 1.861624 | 1.935696 |
| 33 | 1.20412 | 1.210473 | 1.47761 |
| 34 | 2.09691 | 2.11387 | 2.477003 |
| 35 | 2.69897 | 2.536584 | 2.622751 |
| 36 | 3.60206 | 2.482886 | 2.584132 |
| 38 | 3 | 2.359017 | 2.484381 |
| 39 | 2.69897 | 2.258019 | 2.429573 |
| 40 | 1.79588 | 2.03297 | 1.974014 |
| 41 | 2.09691 | 2.201251 | 2.397151 |
| 43 | 2.69897 | 2.490109 | 2.423345 |
| 44 | 1.20412 | 1.789988 | 1.611236 |
| Test set | | | |
| 3 | 1.49485 | 1.247917 | 1.169267 |
| 7 | 0.90309 | 1.510756 | 1.598105 |
| 11 | 1.20412 | 1.232911 | 1.397894 |
| 12 | 1.20412 | 1.174256 | 1.441306 |
| 17 | 3.30103 | 2.499351 | 2.754261 |
| 21 | 2.39794 | 2.444129 | 2.730287 |
| 23 | 2.69897 | 2.146957 | 2.913468 |
| 29 | 2.09691 | 1.970542 | 2.014651 |
| 32 | 1.49485 | 1.228361 | 1.505641 |
| 37 | 3.30103 | 2.463456 | 2.570358 |
| 42 | 2.69897 | 2.114703 | 2.337641 |

a Observed activity (ref. 5); b Calculated from eq. (1); c Calculated from eq. (2).
Table 4: Statistical comparison of different models

| Type of statistical method | R2 | Ra2 | Q2 | R2pred |
|---|---|---|---|---|
| Stepwise regression | 0.647 | 0.623 | 0.509 | 0.665 |
| PLS | 0.709 | 0.689 | 0.595 | 0.759 |
*The best values of different parameters are shown in bold.
Stepwise regression:
Using stepping criteria based on the F value (F = 4.0 for inclusion; F = 3.9 for exclusion), the best equation was obtained after successive addition of E-state parameters.
(1)
The standard errors of the respective E-state indices are given in parentheses. Eq. (1) could explain 62.3% of the variance (adjusted coefficient of determination), and the leave-one-out predicted variance was found to be 50.9%. When Eq. (1) was applied to the prediction of the test set compounds, the predictive R2 value for the test set was found to be 0.665. The negative coefficient of S10 indicates that activity decreases with increasing E-state value of atom 10. Compounds with high values of the E-state parameter for atom 10, such as compounds 1 and 30, showed lower activity. The positive coefficient of S1 indicates that activity increases with increasing E-state value of atom 1 (as in compounds 17, 36, and 37). Position 1 indicates the importance of the connecting methylcarboxamido / methylthiocarboxamido moiety between the two substituted phenyl groups, whereas position 10 indicates the importance of the amino group towards activity.
PLS:
The number of optimum components was 2 to obtain the final equation (optimized by cross validation). Based on the standardized regression coefficients, the following variables were selected for the final equation:
(2)
Eq. (2) could explain 68.9% of the variance (adjusted coefficient of determination), and the leave-one-out predicted variance was found to be 59.5%. When Eq. (2) was applied to the prediction of the test set compounds, the predictive R2 value for the test set was found to be 0.759. The negative coefficients of S7, S9 and S12 indicate that activity decreases with increasing E-state values of atoms 7, 9 and 12, respectively. Compounds with high values of the E-state parameter for atom 7 (S7) (like 5, 33), for atom 9 (S9) (like 1, 12, 15) and for atom 12 (S12) (like 33, 40) showed comparatively poor activity. The positive coefficients of S1, S5, S8, and S10 indicate that activity increases with increasing E-state values of atoms 1, 5, 8 and 10, respectively. Compounds with high values of the E-state parameter for atom 5 (S5) (like 36, 38), for atom 8 (S8) (like 17, 36) and for atom 10 (S10) (like 36, 37) showed comparatively higher activity.
CONCLUSIONS:
The whole data set (n=44) was divided into a training set (33 compounds) and a test set (11 compounds) based on k-means clustering of the standardized descriptor matrix, and models were developed from the training set. The predictive ability of the models was judged from the prediction of the activity of the test set compounds. All the developed models indicate the importance of the connecting methylcarboxamido / methylthiocarboxamido moiety between the two substituted phenyl groups. From the PLS model it was observed that the hydroxyl group contributes negatively towards activity. The models also indicate a positive contribution of the amino group towards activity.
REFERENCES:
1. Bloch KC, Zwerling L, Pletcher MJ, Hahn JA, Gerberding JL and Ostroff SM. Incidence and clinical implications of isolation of Mycobacterium kansasii: results of a 5-year, population-based study. Annals of Internal Medicine. 129; 1998: 698-704.
2. Kunes J, Balsanek V, Pour M, Waisser K and Kaustova J. On the relationship between the substitution pattern of thiobenzanilides and their antimycobacterial activity. Il Farmaco. 57; 2002: 777-782.
3. Hall LH, Mohney B and Kier LB. The Electrotopological State: An Atom Index for QSAR. Quantitative Structure-Activity Relationships. 10; 1991: 43-51.
4. Kier LB and Hall LH. An Electrotopological State Index for Atoms in Molecules. Pharmaceutical Research. 7; 1990: 801-807.
5. Dolezal R, Waisser K, Petrlikova E, Kunes J, Kubicova L, Machacek M, Kaustova J and Martin Dahse H. N Benzylsalicylthioamides: Highly active potential Antituberculotics. Archiv der Pharmazie - Chemistry in Life Sciences. 342; 2009: 113-119.
6. SRETSA is statistical software in Visual Basic, developed by Ray S and Biswas R. and standardized using known data sets.
7. Leonard JT and Roy K. On Selection of Training and Test Sets for the Development of Predictive QSAR models. QSAR and Combinatorial Science. 25; 2006: 235-251.
8. MINITAB is statistical software of Minitab Inc, USA, http://www.minitab.com
9. Darlington RB. Regression and linear models. McGraw Hill, New York, 1990.
10. Wold S. PLS for multivariate linear modeling. In Chemometric Methods in Molecular Design (Methods and Principles in Medicinal Chemistry), Edited by Van de Waterbeemd H. VCH, Weinheim, 1995; pp. 195-218.
11. Fan Y, Shi LM, Kohn KW, Pommier Y and Weinstein JN. Quantitative structure-antitumor activity relationships of camptothecin analogs: Cluster analysis and genetic algorithm-based studies. Journal of Medicinal Chemistry. 44; 2001: 3254-3263.
12. Snedecor GW and Cochran WG. Statistical Methods, Oxford and IBH Publishing Co. Pvt. Ltd., New Delhi, 1967.
13. Debnath AK. In Combinatorial library design and evaluation: Principles, software tools, and applications in drug discovery, Edited by Ghose AK and Viswanadhan VN. Marcel Dekker, New York, 2001; pp. 73–129.
14. Roy K. On some aspects of validation of predictive QSAR models. Expert Opinion on Drug Discovery. 2; 2007: 1567-1577.
15. Roy PP, Leonard JT and Roy K. Exploring the impact of the size of training sets for the development of predictive QSAR models. Chemometrics and Intelligent Laboratory Systems. 90; 2008: 31-42.
Received on 28.09.2011 Modified on 20.10.2011
Accepted on 27.10.2011 © RJPT All right reserved
Research J. Pharm. and Tech. 4(12): Dec. 2011; Page 1904-1909